Overview
We release data about the demographics and outliers of communities of interest (people sharing an interest, a passion, or a profession).
Identified from Wiki-based sources, mainly Wikidata, the data covers 7.5k communities, e.g., members of the White House Coronavirus Task Force, and 345k subjects, e.g., Deborah Birx.
We release subject-centric and group-centric datasets in JSON format.
To access the dataset, click on Downloads.
This web portal offers three interfaces:
- Search for demographics and outliers by topic: From a list of 68 topics and subtopics, you can select a topic, such as Music, and the portal returns a summary of communities of interest related to Music, such as members of a certain band or winners of a certain musical award. Each row describes the recorded members of the community (in the knowledge base Wikidata), their most shared characteristics (demographics), and sample outliers (members who do not share some of these characteristics).

- Search for demographics and outliers by community: Using the auto-completion feature, you can look up more than 7,000 communities of interest covering 68 topics, and the portal returns a summary of the chosen community.

- Entity Summarization: Using the auto-completion feature, you can search more than 300,000 people (Wikidata items), and the portal returns interesting and perhaps unexpected information about them.

To access the full data of interfaces 1) and 2), download the group-centric dataset. For 3), download the subject-centric dataset.
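The notions of demographics (most shared characteristics) and outliers (members who do not share them) used by the interfaces above can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the record layout, the field names ("community", "members", "name", "occupation"), and the placeholder values are hypothetical and do not reflect the schema of the released JSON files, and the simple majority-value rule is not necessarily the method used to build the data.

```python
import json
from collections import Counter

# Hypothetical record: the layout and field names below are assumptions
# made for illustration, not the schema of the released datasets.
raw = """
{
  "community": "example band",
  "members": [
    {"name": "member A", "occupation": "musician"},
    {"name": "member B", "occupation": "musician"},
    {"name": "member C", "occupation": "actor"}
  ]
}
"""


def summarize(record, prop):
    """Return the most shared value of `prop` and the members who differ.

    Members that do not record `prop` at all are ignored rather than
    counted as outliers.
    """
    values = [m[prop] for m in record["members"] if prop in m]
    if not values:
        return None, []
    # The most frequent value stands in for the community's "demographic".
    top_value, _ = Counter(values).most_common(1)[0]
    # Outliers are members whose recorded value differs from it.
    outliers = [
        m["name"]
        for m in record["members"]
        if prop in m and m[prop] != top_value
    ]
    return top_value, outliers


record = json.loads(raw)
demographic, outliers = summarize(record, "occupation")
print(demographic, outliers)
```

To try the same idea on a downloaded file, the inline string would be replaced by json.load on the opened file, with the field names adapted to whatever the actual files contain.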
Illustration of use cases
-- Detecting under-represented groups:

-- Identifying cultural differences:
Paper
To read more about these and other use cases, check out our paper.
Contact us
For any questions, contact Hiba.